class: center, middle, inverse, title-slide

.title[
# Investigating COVID and the Retail Industry
]
.author[
### Sean Bellevue, Brooke Cummins, Jordan Gropper, Andy Turner
]

---

# Contents

- How has COVID affected the health of the retail industry, as measured by employment?
  - Exploring Data
  - Why we are running this analysis
  - Assumptions
  - Testing
  - Results
  - Interpretation
- How has retail fared relative to other industries?
  - Assumption
  - Testing
  - Results
  - Interpretation
- Retail needs to worry about who has money to spend - what has changed about who is working and earning money?
  - Assumption
  - Testing
  - Results
  - Interpretation

---

# Question 1

- How has COVID affected the health of the retail industry, as measured by employment?

---

# Data Exploration

- Large spike in unemployment in early 2020
- A significant portion of observations are retail employees
- Retail and non-retail show similar unemployment trends over time

---

# Why are we running these analyses?

- Colloquially and in the media, certain industries were ‘hit hard’ by COVID
- Retail requires employees to be ‘in person’ for the business to survive
- Employment is a good proxy for industry health
- COVID is a radical change, so we can observe changes over time
- We want to determine how likely it is that the onset of COVID and the accompanying public health policy changes impacted employment in the retail industry, using employment as our dependent variable

---

# Question 1 Assumptions

- Filter observations to only those in retail
- COVID ‘begins’ on April 1, 2020
- A logit model will answer the question best
- Multiple linear regression is not the best solution for this question
- Incidental parameter

---

# Question 1 Regression Testing

- Model 1:

```r
q1_m1 <- feols(Unemployed ~ YEAR + MONTH + covid1_dummy + covid2_dummy + covid3_dummy,
               data = q1_df, weights = q1_df$WTFINL, vcov = 'hetero')
```

- Model 2:

```r
q1_m2 <- feols(Unemployed ~ YEAR*Covid_April + YEAR + MONTH + REGION +
                 covid1_dummy + covid2_dummy + covid3_dummy,
               data = q1_df, weights = q1_df$WTFINL, vcov = 'hetero')
```

- Model 3: our chosen model that ultimately answers question 1

```r
q1_m3 <- feglm(Unemployed ~ YEAR*Covid_April + YEAR + MONTH + REGION +
                 covid1_dummy + covid2_dummy + covid3_dummy,
               data = q1_df, weights = q1_df$WTFINL,
               family = binomial(link = 'logit'))
```

---

# Question 1 Results

- Model Accuracy Chart

---

# Question 1 Results

- Marginal Effects Summary
- Results of Before and During (After April 1, 2020)

---

# Interpretation: Question 1

Based on this regression and analysis, pre-COVID (before April 1st, 2020) we would anticipate an unemployment rate of 3.69%, and post-COVID we anticipate an unemployment rate of 11.54%, which is a massive spike. This aligns much more closely with what we saw in our exploratory data analysis charts.

---

# Question 2

- How has retail fared relative to other industries?
- Retail has struggled with consumer demand during COVID, which has in turn caused employees to lose hours or their jobs:

---

# Why are we running these analyses?

- As the images on the last slide show, all industries suffered from COVID layoffs; however, I wanted to compare and contrast retail against all other industries in general.
- Dummy variable for retail or non-retail, and dummy variable for before/after April
- Model #1 is just a generalized multivariate model, which progresses into a logit model.

---

# Question 2 Assumptions

- April Assumption: Job loss takes time
- Assumptions about other industries: No massive outliers
- Assumption about employment type: The largest impact on an industry is a layoff
- Incidental Parameter: Assumption about multiple Logit Fixed Effects.
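
One way the two indicator variables behind these assumptions might be built is sketched below. This is a minimal illustration on made-up rows, not the project's actual data preparation: the `toy` data frame and the `IND_RETAIL` flag are hypothetical, while `YEAR`, `MONTH`, `retail_or_not`, and `covid_april` follow the names used in the model formulas. It assumes the April 1, 2020 cutoff.

```r
# Toy rows for illustration only (not the survey data).
toy <- data.frame(
  YEAR = c(2019, 2020, 2020, 2021),
  MONTH = c(12, 3, 4, 1),
  IND_RETAIL = c(TRUE, FALSE, TRUE, FALSE)  # hypothetical retail flag
)

# 1 for observations in or after April 2020, 0 before.
toy$covid_april <- as.integer(
  toy$YEAR > 2020 | (toy$YEAR == 2020 & toy$MONTH >= 4)
)

# 1 for retail workers, 0 for everyone else.
toy$retail_or_not <- as.integer(toy$IND_RETAIL)

print(toy)
```

Keeping both indicators as 0/1 integers lets them enter a formula directly, including as an interaction such as `retail_or_not*covid_april`.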

---

# Question 2 Regression Testing

- Model 1: General Multivariate

```r
q2_m1 <- feols(unemper ~ retail_or_not + YEAR + MONTH |
                 REGION + covid1_dummy + covid2_dummy + covid3_dummy,
               data = ques2, weights = ques2$WTFINL, vcov = 'hetero')
```

- Model 2: Binary Dependent Logit

```r
q2_m2 <- feglm(unemper ~ retail_or_not + YEAR + MONTH |
                 REGION + covid1_dummy + covid2_dummy + covid3_dummy,
               data = ques2, weights = ques2$WTFINL,
               family = binomial(link = 'logit'))
```

- Model 3: Interaction Term Logit

```r
q2_m3 <- feglm(unemper ~ retail_or_not*covid_april + YEAR + MONTH +
                 covid1_dummy + covid2_dummy + covid3_dummy +
                 AGE + SEX + RACE + REGION,
               data = ques2, weights = ques2$WTFINL,
               family = binomial(link = 'logit'))
```

---

# Question 2 Results: Model 1

---

# Question 2 Results: Model 2

- Results show low impact, which gives reason to look at time-based interactions

---

# Interpretation of Selected Model

- Model #3: Interpreting Logit Interaction Terms
- Interaction effect code used:
  - `summary(predictions(ques2logit2, newdata = datagrid(retail_or_not = 1, covid_april = 1)))`
- First scenario: for retail workers before April 1st, the risk of unemployment was 2.3%, meaning that if we had only 1,000 retail workers pre-pandemic, we would expect 23 of them to be unemployed.
- Second scenario: if everyone was a retail worker after April 1st, we would anticipate an unemployment rate of 4.85%.
- Third scenario: if no one was a retail employee before April 1st, we would anticipate an unemployment rate of 1.42%.
- Fourth scenario: non-retail workers after April 1st would anticipate an unemployment rate of 3.09%, which is lower than for retail workers.

---

# Question 3

- Retail needs to worry about who has money to spend - what has changed about who is working and earning money?

---

# Why are we running these analyses?

- The labor force is still very much white, male, and between 20 and 60 years of age
- However, the demographics are changing slightly:
  - Leaning slightly more female than before
  - Leaning slightly younger than before

---

# Why are we running these analyses?

- By looking at who is employed at each point in the pandemic, we can see how demographics are shifting in the labor force.
- Using income and employment separately might help show small differences in not just who is working but who has income that they can then spend.

---

# Question 3 Assumptions

- People leaving the labor force stop reporting their income, so as people are laid off the average income shifts up because lower earners drop out of the data set.
- The variables created to mark the various waves of COVID identify a general enough time frame that they apply to every region, regardless of when that region actually experienced the wave.
- A binary variable for employment is adequate for these regression models to be accurate enough for our purposes.

---

# Question 3 Regression Testing

- Model 1:

```r
q3_m1 <- feols(Employed ~ race_dummy + student_dummy + AGE + sex_binary +
                 covid1_dummy + covid2_dummy + covid3_dummy,
               data = q3_df)
```

- Model 2:

```r
q3_m2 <- feols(IncNumber ~ race_dummy + student_dummy + AGE + sex_binary +
                 covid1_dummy + covid2_dummy + covid3_dummy,
               data = q3_df)
```

- Model 3:

```r
q3_m3 <- feols(Employed ~ all_covid + AGE + race_dummy + sex_binary +
                 I(all_covid*AGE) + I(all_covid*race_dummy) + I(all_covid*sex_binary),
               data = q3_df)
```

- Model 4:

```r
q3_m4 <- feols(IncNumber ~ all_covid + AGE + race_dummy + sex_binary +
                 I(all_covid*AGE) + I(all_covid*race_dummy) + I(all_covid*sex_binary),
               data = q3_df)
```

---

# Question 3 Results: Models 1 and 2

- Models 1 and 2 did not include interaction terms and show that overall the demographics of the labor force leaned younger, whiter, and male. They also show that COVID had a negative effect on employment, as expected.

```r
etable(q3_m1, q3_m2)
```

```
##                                q3_m1               q3_m2
## Dependent Var.:             Employed           IncNumber
##
## (Intercept)       0.8476*** (0.0009) 75,490.2*** (93.01)
## race_dummy        0.0242*** (0.0006)  8,892.2*** (57.11)
## student_dummy    -0.5231*** (0.0009)    754.1*** (84.96)
## AGE             -0.0056*** (1.65e-5)   -63.14*** (1.625)
## sex_binary       -0.1411*** (0.0005) -2,473.9*** (45.41)
## covid1_dummy     -0.0508*** (0.0013)  3,281.4*** (125.9)
## covid2_dummy     -0.0615*** (0.0008)  3,221.3*** (76.44)
## covid3_dummy     -0.0198*** (0.0006)  1,612.4*** (62.65)
## _______________ ____________________ ___________________
## S.E. type                        IID                 IID
## Observations               4,196,893           4,114,472
## R2                           0.10402             0.00769
## Adj. R2                      0.10402             0.00769
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---

# Question 3 Results: Models 3 and 4

- Models 3 and 4 did include interaction terms and show that within the pandemic the demographics changed very slightly away from white and male toward other groups.

```r
etable(q3_m3, q3_m4)
```

```
##                                          q3_m3               q3_m4
## Dependent Var.:                       Employed           IncNumber
##
## (Intercept)                 0.5963*** (0.0012) 73,318.2*** (114.5)
## all_covid                  -0.0301*** (0.0017)  6,653.5*** (164.4)
## AGE                       -0.0009*** (2.13e-5)   -52.31*** (2.011)
## race_dummy                  0.0275*** (0.0008)  9,424.2*** (79.70)
## sex_binary                 -0.1527*** (0.0007) -2,519.1*** (63.01)
## I(all_covid x AGE)        -9.79e-5** (3.07e-5)   -36.99*** (2.890)
## I(all_covid x race_dummy)    0.0039** (0.0012) -1,018.4*** (114.2)
## I(all_covid x sex_binary)   0.0110*** (0.0010)       134.0 (90.79)
## _________________________ ____________________ ___________________
## S.E. type                                  IID                 IID
## Observations                         4,196,893           4,114,472
## R2                                     0.02417             0.00926
## Adj. R2                                0.02417             0.00926
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---

# Interpretation:

- In general the labor force is made up of the same demographics.
- One important distinction is the slight increase in employment for women and non-white people relative to the period of time before March of 2020.
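
To read the interaction terms in Model 3, each baseline coefficient can be added to its interaction with `all_covid` to get the association during the COVID period. The sketch below only does that arithmetic on the employment coefficients printed in the `q3_m3` column; the vector names are illustrative, and because the slides do not state how `race_dummy` and `sex_binary` are coded, it makes no claim about which group each value refers to.

```r
# Employment coefficients copied from the q3_m3 column of the etable output.
baseline <- c(AGE = -0.0009, race_dummy = 0.0275, sex_binary = -0.1527)

# Matching I(all_covid x ...) interaction coefficients from the same column.
covid_shift <- c(AGE = -9.79e-5, race_dummy = 0.0039, sex_binary = 0.0110)

# Association with employment when all_covid = 1.
during_covid <- baseline + covid_shift

print(round(rbind(baseline, during_covid), 4))
```

With these numbers the `sex_binary` coefficient moves from -0.1527 to -0.1417, and the `race_dummy` coefficient moves from 0.0275 to 0.0314.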

---

# Thank You

- We enjoyed this class and this project
- Congrats on winning Idiotest and